An evaluation of DGA classifiers
Domain Generation Algorithms (DGAs) are a popular technique used by contemporary malware for command-and-control (C&C) purposes. Such malware uses DGAs to create a set of domain names that, when resolved, provide the information necessary to establish a link to a C&C server. Automated discovery of such domain names in real-time DNS traffic is critical for network security, as it allows infections to be detected and, in some cases, countermeasures to be taken to disrupt the communication and identify infected machines. Detecting the specific DGA malware family gives the administrator valuable information about the kind of infection and the steps that need to be taken. In this paper we compare and evaluate machine learning methods that classify domain names as benign or DGA, and label the latter according to their malware family. Unlike previous work, we select data for the test and training sets according to observation time and known seeds. This allows us to assess the robustness of the trained classifiers for detecting domains generated by the same families at a different time or when seeds change. Our study includes tree ensemble models based on human-engineered features and deep neural networks that learn features automatically from domain names. We find that all state-of-the-art classifiers are significantly better at catching domain names from malware families with a time-dependent seed than from time-invariant DGAs. In addition, when applying the trained classifiers to a day of real traffic, we find that many domain names are unjustifiably flagged as malicious, thereby revealing the shortcomings of relying on a standard whitelist for training a production-grade DGA detection system.
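The distinction between time-dependent and time-invariant seeds can be made concrete with a toy generator. The sketch below is a hypothetical illustration: the function `toy_dga`, the SHA-256 derivation, and the `.com` suffix are assumptions and do not reproduce any real malware family. Mixing the date into the seed makes the domain list change every day, which is one reason why splitting training and test data by observation time and seed probes robustness differently from a random split.

```python
import hashlib
from datetime import date
from typing import List, Optional

def toy_dga(seed: str, day: Optional[date], count: int = 5, length: int = 12) -> List[str]:
    """Hypothetical toy DGA, not modeled on any real malware family."""
    # Time-dependent variant: the date is mixed into the seed material, so the
    # generated list changes daily. Time-invariant variant: pass day=None.
    material = seed if day is None else f"{seed}-{day.isoformat()}"
    domains = []
    for i in range(count):
        digest = hashlib.sha256(f"{material}-{i}".encode()).hexdigest()
        # Map each hex digit (0-15) to a letter a-p to build the label.
        label = "".join(chr(ord("a") + int(c, 16)) for c in digest[:length])
        domains.append(label + ".com")
    return domains

# Same seed on two different days yields two different lists.
print(toy_dga("s3cr3t", date(2019, 1, 1))[:2])
print(toy_dga("s3cr3t", date(2019, 1, 2))[:2])
# Time-invariant: the list is identical every day.
print(toy_dga("s3cr3t", None) == toy_dga("s3cr3t", None))  # True
```

A classifier trained on one day of a time-dependent family sees fresh strings the next day, while a time-invariant family keeps emitting the same names once its seed is fixed.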
Inline detection of DGA domains using side information
Malware applications typically use a command and control (C&C) server to manage bots that perform malicious activities. Domain Generation Algorithms (DGAs) are popular methods for generating pseudo-random domain names that can be used to establish communication between an infected bot and the C&C server. In recent years, machine-learning-based systems have been widely used to detect DGAs. Several well-known state-of-the-art classifiers in the literature can detect DGA domain names in real-time applications with high predictive performance. However, these DGA classifiers are highly vulnerable to adversarial attacks in which adversaries purposely craft domain names to evade DGA detection classifiers. In our work, we focus on hardening DGA classifiers against adversarial attacks. To this end, we train and evaluate state-of-the-art deep learning and random forest (RF) classifiers for DGA detection using side information that is harder for adversaries to manipulate than the domain name itself. Additionally, the side information features are selected such that they are easy to obtain in practice, which makes inline DGA detection feasible. The performance and robustness of these models are assessed by exposing them to one day of real-traffic data as well as to domains generated by adversarial attack algorithms. We found that the DGA classifiers that rely on both the domain name and side information have high performance and are more robust against adversaries.
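To illustrate what combining the domain name with side information might look like, the sketch below builds one feature vector from a few lexical features plus fields taken from the DNS response. The `DnsResponse` fields (`ttl`, `num_answers`, `is_nxdomain`) and the function `feature_vector` are hypothetical choices for illustration; the abstract does not list the actual side-information features used.

```python
import math
from collections import Counter
from dataclasses import dataclass
from typing import List

@dataclass
class DnsResponse:
    # Hypothetical side-information fields; not the paper's actual feature set.
    ttl: int           # TTL of the first answer record (0 if there is none)
    num_answers: int   # number of records in the answer section
    is_nxdomain: bool  # True if the resolver answered NXDOMAIN

def shannon_entropy(s: str) -> float:
    """Character-level Shannon entropy in bits."""
    n = len(s)
    return -sum(c / n * math.log2(c / n) for c in Counter(s).values())

def feature_vector(domain: str, resp: DnsResponse) -> List[float]:
    # Simplification: analyze the leftmost label only; a real system would
    # strip the public suffix properly.
    label = domain.split(".")[0]
    if not label:
        raise ValueError("empty label")
    lexical = [
        float(len(label)),
        sum(ch.isdigit() for ch in label) / len(label),  # digit ratio
        shannon_entropy(label),
    ]
    # Side information comes from the DNS response rather than the domain
    # string, which the abstract argues is harder for an adversary to manipulate.
    side = [float(resp.ttl), float(resp.num_answers), float(resp.is_nxdomain)]
    return lexical + side

print(feature_vector("x7k2qp9z.info", DnsResponse(ttl=0, num_answers=0, is_nxdomain=True)))
```

Each such vector would form one row of the input to, for example, a random forest; a purely lexical classifier would see only the first three entries.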
CharBot: A Simple and Effective Method for Evading DGA Classifiers
Domain generation algorithms (DGAs) are commonly leveraged by malware to create lists of domain names, which can be used for command and control (C&C) purposes. Approaches based on machine learning have recently been developed to automatically detect generated domain names in real time. In this work, we present a novel DGA called CharBot, which is capable of producing large numbers of unregistered domain names that are not detected by state-of-the-art classifiers for real-time detection of DGAs, including the recently published methods FANCI (a random forest based on human-engineered features) and LSTM.MI (a deep learning approach). CharBot is very simple, effective and requires no knowledge of the targeted DGA classifiers. We show that retraining the classifiers on CharBot samples is not a viable defense strategy. We believe these findings show that DGA classifiers are inherently vulnerable to adversarial attacks if they rely only on the domain name string to make a decision. Designing a robust DGA classifier may, therefore, necessitate the use of additional information besides the domain name alone. To the best of our knowledge, CharBot is the simplest and most efficient black-box adversarial attack against DGA classifiers proposed to date.
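The abstract does not spell out CharBot's procedure. The sketch below assumes a CharBot-like scheme: take a benign second-level label, overwrite two randomly chosen characters, and attach a randomly chosen TLD. The lists `BENIGN_LABELS` and `TLDS` and the function `charbot_like` are hypothetical placeholders, not the authors' code. It shows why such an attack is black-box: nothing in it queries or inspects a classifier.

```python
import random
import string

# Hypothetical inputs; a real attack would draw from a large list of popular benign domains.
BENIGN_LABELS = ["wikipedia", "example", "university", "weather"]
TLDS = ["com", "net", "org", "info"]
# Letters and digits; the hyphen, also valid inside a label, is omitted for simplicity.
CHARSET = string.ascii_lowercase + string.digits

def charbot_like(rng: random.Random) -> str:
    label = list(rng.choice(BENIGN_LABELS))
    # Pick two distinct positions and overwrite them with random characters
    # (a replacement may coincide with the original character).
    for i in rng.sample(range(len(label)), 2):
        label[i] = rng.choice(CHARSET)
    return "".join(label) + "." + rng.choice(TLDS)

rng = random.Random(42)
print([charbot_like(rng) for _ in range(3)])
```

Because each output differs from a benign name in at most two characters, its lexical statistics stay close to those of benign traffic, which is consistent with the abstract's conclusion that string-only classifiers are hard to harden.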
Analyzing the Real-World Applicability of DGA Classifiers
Separating benign domains from domains generated by DGAs with the help of a binary classifier is a well-studied problem for which promising performance results have been published. The corresponding multiclass task of determining the exact DGA that generated a domain, which enables targeted remediation measures, is less well studied. Selecting the most promising classifier for these tasks in practice raises a number of questions that prior work has not yet addressed. These include on which traffic, in which network, and at what time to train, as well as how to assess robustness against adversarial attacks. Moreover, it is unclear which features lead a classifier to a decision and whether the classifiers are real-time capable. In this paper, we address these issues and thus contribute to bringing DGA detection classifiers closer to practical use. In this context, we propose one novel classifier based on residual neural networks for each of the two tasks and extensively evaluate them as well as previously proposed classifiers in a unified setting. We not only evaluate their classification performance but also compare them with respect to explainability, robustness, and training and classification speed. Finally, we show that our newly proposed binary classifier generalizes well to other networks, is time-robust, and is able to identify previously unknown DGAs.
Comment: Accepted at The 15th International Conference on Availability, Reliability and Security (ARES 2020)
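One common way to test whether a classifier can identify previously unknown DGAs is a leave-one-family-out split, in which each DGA family in turn is withheld from training. The abstract does not say whether this exact protocol was used; the function `leave_one_family_out` and the `Sample` layout below are assumptions for illustration.

```python
from typing import Iterator, List, Tuple

Sample = Tuple[str, str]  # (domain, label); label is "benign" or a DGA family name

def leave_one_family_out(samples: List[Sample]) -> Iterator[Tuple[str, List[Sample], List[Sample]]]:
    """Yield (held_out_family, train, test) with one DGA family hidden from training."""
    families = sorted({label for _, label in samples if label != "benign"})
    for held_out in families:
        train = [s for s in samples if s[1] != held_out]
        # The test set contains only the unseen family, so a binary classifier's
        # score on it is a detection rate (recall) for that family.
        test = [s for s in samples if s[1] == held_out]
        yield held_out, train, test

data = [("google.com", "benign"), ("qxjzkw.net", "fam_a"),
        ("vbnmtr.org", "fam_a"), ("k3h9x2.info", "fam_b")]
for family, train, test in leave_one_family_out(data):
    print(family, len(train), len(test))
# fam_a 2 2
# fam_b 3 1
```

Estimating false positives in the same setting would additionally require holding out benign samples, which this minimal sketch keeps entirely in the training set.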
Hardening Inline DGA Classifiers Against Adversarial Attacks
Thesis (Master's)--University of Washington, 2019
Domain Generation Algorithms (DGAs) are widely used by cybercriminals to generate domain names on the fly for command-and-control (C&C) purposes, that is, for establishing communication with bots and instructing them to perform malicious activities. It is therefore important to detect domains generated by DGAs in order to block the communication between the bot and the C&C server. In recent years, machine-learning-based DGA detection systems have been widely used to address this problem. However, classifiers that rely only on the domain name to detect DGAs have been found to be highly vulnerable to adversarial attacks. Adversarial attacks are intentionally devised by an attacker to fool a classifier and cause it to produce erroneous results. This is a serious concern, as such attacks degrade the performance of DGA detection classifiers. In this thesis, we aim to defend DGA detection classifiers against adversarial attacks without compromising the performance of existing state-of-the-art classifiers in the literature. One technique for doing so is to use side information features obtained from the DNS query/response that cannot be easily manipulated by the adversary. Although past research has used DNS features for retrospective analysis of DNS traffic, to the best of our knowledge, no studies leverage such data for inline detection of DGA domains. In our work, we train machine learning models based on tree ensembles and deep learning for DGA detection using side information (in addition to the domain name) that can be easily obtained in practice without relying on external data sources such as WHOIS. We also exclude methods that analyze past DNS data to extract side information features, which keeps the computation relatively lightweight for detecting DGA domains in real-time DNS applications.
Finally, we perform an empirical evaluation by applying the best-performing classifiers trained using side information to one day of passive DNS traffic and comparing their performance against a well-known state-of-the-art classifier that relies only on the domain name for DGA detection. Results show that classifiers trained using a combination of lexical and side information features not only provide high performance but are also more robust to adversarial attacks than classifiers that rely only on the domain name for inline DGA detection.